Abstract


Practical safety-critical distributed systems must integrate safety critical and non-critical data 
in a common platform. Safety-critical systems almost always consist of isochronous components
that have synchronous or asynchronous interfaces with other components. Many of these systems
also support a mix of synchronous and asynchronous interfaces. This report presents a study on 
the modeling and analysis of asynchronous, synchronous, and mixed synchronous/asynchronous 
systems. We build on the SAE Architecture Analysis and Design Language (AADL) to capture 
architectures for analysis. We present preliminary work targeted to capture mixed low- and high- 
criticality data, as well as real-time properties in a common Model of Computation (MoC). An 
abstract, but representative, test specimen system was created as the system to be modeled. 
1 Introduction


The documented work was performed under NASA Task Order NNL10AB32T, Validation And 
Verification of Safety-Critical Integrated Distributed Systems - Area 2. 

1.1 Scope 

This document is intended to satisfy the requirements for Deliverable 5.1.12 under Task 4.1.3.1
of this Task Order. It accompanies and provides the documentation for Deliverable 5.1.13, which 
includes models in electronic form. This modeling is work in progress and future changes can be 
expected. 

The description of Task 4.1.3.1 says:

The contractor shall develop advanced modeling and analysis capabilities to address 
emerging trends in integrated distributed systems architectures. In particular, the 
contractor shall define and model example mixed synchronous/asynchronous IMA ar- 
chitectures and applications. The contractor shall establish non-interference between 
time-triggered and asynchronous modes of communication. The contractor shall model 
and analyze fallback control from time-triggered operation to asynchronous mode to al- 
low graceful degradation using a common network infrastructure. The contractor shall 
perform dependability, performance, and interaction analysis on models of example 
systems. 

1.2 Motivation and Modeling 

Trends in Integrated Modular Avionics (IMA) show increasing integration of mixed criticality data.
Practical, safety-critical systems typically manage critical and non-critical data within the same 
platform. An emerging trend is the use of mixed synchronous/asynchronous communication pat- 
terns that enable designers to categorize data flows to achieve the best utilization given depend- 
ability requirements. 

Researchers and developers of formal method tools would like to use real-world examples to 
test their research ideas and tool developments; however, creating such examples just for testing is 
prohibitively expensive. What is needed is an inexpensive way to share existing real-world designs 
in a format that can be used by formal methods tools. 

This document presents an approach for the modeling and analysis of mixed synchronous/asynchronous
systems. We build on the Society of Automotive Engineers (SAE) Architecture Analysis &
Design Language (AADL) to capture distributed, fault-tolerant architectures for analysis.

One of the extensions to AADL is the “Annex E: Error Model Annex”, which is contained in
the SAE Aerospace Standard AS5506/1 [1]. The purpose of this Annex is to define an AADL
standard compliant extension to the AADL core language for the support of dependability and 
fault modeling. 

1.3 Tools 

The AADL Error Annex work described in this report is based on AADL v1, Open Source AADL
Tool Environment (OSATE) v1.5.8, and Error Annex plug-in version 1.1.7. All are freely available
at http://www.aadl.info. The AADL model figures were created using the Error Detection Isolation
Containment Types (EDICT) tool suite, available at http://www.wwtechnology.com. EDICT
is based on the AADL v2 language. Other tools used include:
• The Symbolic Analysis Laboratory (SAL) is a formal model checker tool developed by SRI
International, available at http://sal.csl.sri.com.

• The Distributed Real-time Embedded Analysis Method (DREAM) is an open-source tool and
approach for performance estimation and real-time verification through Discrete Event Simulations
(DES). DREAM can also automatically generate timed automata models for the
UPPAAL (www.uppaal.com) and Verimag IF (http://www-if.imag.fr) model checkers.
2 Background

2.1 Timing Terminology 

2.1.1 Standard Terminology 

When discussing “synchronous” and “asynchronous” systems, it helps to have well-established 
definitions for these terms. A good set of definitions for timing terminology can be found in 
Recommendation G.701 of the International Telecommunication Union (ITU), titled “Vocabulary 
of Digital Transmission and Multiplexing, and Pulse Code Modulation (PCM) Terms.” [2] We quote 
the relevant text below (non-English synonyms removed). Terms in square brackets are in common
practice but their use is deprecated in the sense defined in the Recommendation. 

6014 isochronous 

The essential characteristic of a time-scale or a signal such that the time intervals be- 
tween consecutive significant instants either have the same duration or durations that 
are integral multiples of the shortest duration. 

NOTE - In practice, variations in the time intervals are constrained within specified limits. 

6015 anisochronous 

The essential characteristic of a time-scale or a signal such that the time intervals 
between consecutive significant instants do not necessarily have the same duration or 
durations that are integral multiples of the shortest duration. 

6016 synchronous [mesochronous] 

The essential characteristic of time-scales or signals such that their corresponding sig- 
nificant instants occur at precisely the same average rate. 

NOTE - The timing relationship between corresponding significant instants usually varies 
between specified limits. 

6017 homochronous 

The essential characteristic of time-scales or signals such that their corresponding sig- 
nificant instants have a constant, but uncontrolled, time relationship with each other. 

6018 non-synchronous [asynchronous/heterochronous] 

The essential characteristic of time-scales or signals such that their corresponding sig- 
nificant instants do not necessarily occur at the same average rate. 

6019 plesiochronous 

The essential characteristic of time-scales or signals such that their corresponding signif- 
icant instants occur at nominally the same rate, any variation in rate being constrained 
within specified limits. 

NOTES 

1 Two signals having the same nominal digit rate, but not stemming from the same clock 
or homochronous clocks, are usually plesiochronous. 

2 There is no limit to the time relationship between corresponding significant instants. 
6020 heterochronous

The essential characteristic of time-scales or signals such that their corresponding sig- 
nificant instants occur at different nominal rates. 

NOTES 

1 Two signals having different nominal digit rates, and not stemming from the same clock 
or from homochronous clocks are usually heterochronous. 

2 Terms 6014 to 6020 are based on the following Greek roots:
iso = equal
homo = same
plesio = near
hetero = different

Most scholarly work and standards use definitions that are substantially similar to those quoted 
above. When wording in documents of these types differs from the wording of G.701, it is usually to 
make the definition easier to understand for readers unfamiliar with timing terminology. The cost 
of this accommodation is that the new definition may be ambiguous, imprecise/partially accurate, 
or even inapplicable to some situations. For example, such a definition (when read literally) may 
require absolutely zero jitter or may not apply to isochronous time-scales in which events are not 
equally spaced. 
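
The role of the "within specified limits" note in definition 6014 can be illustrated with a small check. The following is a minimal sketch, not part of G.701: the function name, the use of a single absolute tolerance, and the choice of the shortest measured interval as the base duration are all assumptions made for illustration.

```python
def is_isochronous(instants, tolerance):
    """Return True if every interval between consecutive significant
    instants is, within `tolerance`, an integral multiple of the
    shortest interval (hypothetical helper, illustrative only)."""
    intervals = [b - a for a, b in zip(instants, instants[1:])]
    if not intervals or min(intervals) <= 0:
        return False
    shortest = min(intervals)
    for interval in intervals:
        # Nearest integral multiple of the shortest duration.
        multiple = max(1, round(interval / shortest))
        if abs(interval - multiple * shortest) > tolerance:
            return False
    return True
```

With a tolerance of zero, this check rejects any time-scale that has jitter, which is the over-literal reading described above. With a nonzero tolerance, it accepts both jittered time-scales and time-scales whose events are not equally spaced but fall on multiples of the shortest duration.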

2.1.2 Applicability to Avionics 

Isochronous components: Major components of avionics and other embedded digital electronic
systems tend to have isochronous time-scales. These scales can be defined either explicitly or
implicitly. Explicitly designed isochronous time-scales are designed to run at specific rates that 
are controlled by time signals derived from a clocking device such as a crystal oscillator. Implicit 
isochronous behavior emerges from components that aren’t specifically designed to run at any 
particular rate (or rates), but run at an isochronous rate because their design is based on a repetitive 
loop of operations (either a hardware state machine or software). The time it takes to do one loop 
of these operations creates the period for an isochronous timescale. Both explicit and implicit 
isochronous designs will exhibit some degree of jitter. 

Explicitly designed isochronous components can, and often do, have subcomponents that exe- 
cute at harmonic rates. That is, each subcomponent runs at a rate that is an integer multiple of 
any lower frequency rates used by the other subcomponents. Because the simplest of these integer 
multiples (in terms of implementation on digital hardware) is two, this is the ratio most often used. 
One of the most popular isochronous rate hierarchies is 10, 20, 40, and 80 Hz. 
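
The harmonic-rate property can be checked mechanically. The sketch below is illustrative only; the function names are hypothetical, and rates are assumed to be positive integers in Hz.

```python
def is_harmonic(rates_hz):
    """Return True if every rate is an integer multiple of every lower rate.

    Checking consecutive pairs of the sorted rates is sufficient, because
    divisibility is transitive.
    """
    rates = sorted(rates_hz)
    return all(high % low == 0 for low, high in zip(rates, rates[1:]))


def executions_per_major_cycle(rates_hz):
    """Number of executions of each rate during one period of the slowest rate."""
    slowest = min(rates_hz)
    return {rate: rate // slowest for rate in rates_hz}
```

For the 10, 20, 40, and 80 Hz hierarchy, each subcomponent executes 1, 2, 4, or 8 times per 100 ms major cycle.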

Some components may have some isochronous subcomponents that are not synchronous with 
other isochronous subcomponents. Other components may have some subcomponents that are 
isochronous and other subcomponents that are anisochronous and non-synchronous with other 
subcomponents. 

Synchronous systems: Saying that a system is synchronous can mean that any signal traversing
a path, from system input to system output, through several isochronous components crosses only
synchronous boundaries between these isochronous components. Or, in a redundant system, it can 
mean the timing between the redundant components in a redundancy set are synchronous. Or, it 
can mean both. That is, there are two orthogonal dimensions to synchrony in a system. 

The differences between these meanings/dimensions of “synchronous” can be described with
reference to Figure 1. The inputs to this example system are the Pilot Input and Sensor components.
Signals produced by these components traverse the system in a processing “pipeline” using the 
following order: 
1. Switch (SW)

2. Processing Element (PE) 

3. Analog Control Electronics (ACE) 

In this figure, such signal flows can be seen as generally in the horizontal dimension. 

The figure also shows redundancy. Members of a redundancy set are separated vertically. 
Thus, in the context of this figure, we can talk about “horizontal synchronization” for the input- 
to-output pipeline and “vertical synchronization” for the redundant elements. Any such pipelined 
and redundant system can be represented similarly; therefore, the concept of two dimensions of 
synchronization applies to all such systems. 

Unless otherwise specified, when an avionics designer says a system is synchronous, this usually 
means that the system is synchronous in both dimensions. We use this convention in this document. 

For any given system, one mechanism can provide both dimensions of synchronization or each 
dimension could be provided by a separate mechanism. In a design using separate components for 
each dimension, a component may have difficulties when the horizontal and vertical synchroniza- 
tion mechanisms try to adjust the time-scale in opposite directions. While this problem is rarely 
discussed in the literature, a system design must resolve this conflict such that requirements for 
both dimensions of synchronization are satisfied simultaneously. 

Asynchronous systems: The term “asynchronous” has been ambiguous since the very beginning
of digital avionics system design. It may be used to mean either anisochronous or non-synchronous
or both, often without sufficient context to resolve the ambiguity. Per the G.701 recommendations,
the use of this term has been deprecated in favor of the unambiguous terms anisochronous or
non-synchronous. However, because this term is so ingrained in digital avionics design and most 
digital avionics system designers are unfamiliar with the unambiguous terms, the (mis)use of this 
term is likely to continue for the foreseeable future. For this reason, we continue to use the term 
“asynchronous” where it may be expected by these designers and where its use is either unambiguous 
or its precise definition is unimportant. In other cases, the unambiguous terms are used. For this 
task, an “asynchronous system” means a system consisting of components that are each isochronous
by themselves and not synchronous with any other component in either of the two dimensions discussed
in Section 2.1.2.

Mixed systems: In the phrase “mixed synchronous/asynchronous IMA architectures” from this
task’s description (see Section 1.1), the term “mixed” may have either of two definitions. It could
mean that some interfaces are asynchronous (only) and some are synchronous (only) (in this case, 
asynchronous means non-synchronous). Or, “mixed” could mean that some interfaces and/or 
components are both asynchronous and synchronous. For example, a component could have two 
isochronous subcomponents that are non-synchronous with each other, and this component inter- 
faces with a similar component such that one subcomponent of each component is synchronous 
with the other via the interface while the other subcomponents are non-synchronous across that 
interface. 


Timing adjustments: When correcting a component’s time in order to maintain synchronization,
the adjustments are rarely done at the crystal oscillator level. Some early work attempted
correction at this level for fault tolerant clocks. But, because: (1) no analysis techniques have been 
developed that can establish a bound on the fault propagation effects for the analog techniques 
typically used, (2) the lowering cost of digital electronics has eliminated any cost advantage for 
these schemes, and (3) the difficulties in making such mechanisms work with the high-speed (~1 
GHz) clocks used in today’s processors, it has been many years since such mechanisms have been 
proposed. Instead, current techniques use some form of “sparse timebase” in which a high-speed 
clock is divided down to a lower speed clock via digital counters that can be dynamically adjusted 
to maintain the required rate and/or phasing of the lower speed clock. 
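
The divide-down mechanism described above can be sketched as a counter whose current period can be stretched or shortened. This is a minimal illustration, not any particular product's implementation; the class name, method names, and the choice to apply a correction only to the period in progress are assumptions.

```python
class AdjustableDivider:
    """Divide a high-speed clock down to a lower-speed clock using a
    down-counter that a synchronization mechanism may adjust at run time
    (hypothetical sketch)."""

    def __init__(self, nominal_count):
        self.nominal_count = nominal_count  # fast cycles per slow tick
        self.remaining = nominal_count      # fast cycles left in this period
        self.low_speed_ticks = 0

    def adjust(self, delta_counts):
        """Lengthen (positive) or shorten (negative) the current period."""
        self.remaining = max(1, self.remaining + delta_counts)

    def fast_tick(self):
        """Advance one high-speed cycle; return True on a low-speed tick."""
        self.remaining -= 1
        if self.remaining == 0:
            self.remaining = self.nominal_count
            self.low_speed_ticks += 1
            return True
        return False
```

Because the correction is a purely digital change to a counter value, its effect on the lower-speed clock is bounded by the size of the adjustment, which is the property the analog schemes could not establish.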


2.2 The Synchronous/Asynchronous Debates


Debates over the relative merits of synchronous vs. asynchronous systems have taken place for 
decades. In these debates, these terms typically refer to systems in which the components are 
explicitly or implicitly isochronous and the discussion is whether these components should be syn- 
chronous to some “global” timeline or if each of the interfaces between the isochronous components 
should be asynchronous. In rare cases, the definition of asynchronous truly means event driven. 
However, most safety-critical systems are control systems that approximate continuous control 
rather than discrete event controllers. 

A goal of this task is to shed some light on the synchronous vs. asynchronous debate and to
provide a useful comparison of their characteristics. In particular, we address some of the issues
posed in Section 3.5.


3 Example System 

3.1 Selection of Test Specimens 

Test specimen selection or creation for formal methods research occurs along a trade-off spectrum. 
At one end of the spectrum, are problems that are so simplified that they bear little resemblance 
to reality, and the results of formal methods research using these specimens can be called into 
question. At the other end of the spectrum, one can design a full system exactly as it would be 
fielded. Such a system would have to be developed “from scratch” to avoid proprietary and ITAR 
issues (the results of this research must be available for open publication). Avoiding these issues 
makes the design of such a system actually more expensive than designing a real system, because 
proprietary legacy components cannot be used. The cost of developing such a system can easily 
run into tens of millions of dollars, just to develop the test specimen before the research can begin.
In addition, the complexity of a full system can obscure the research results (hiding the forest with 
leaves). Thus, useful test specimens are found in the part of the spectrum that provides the most 
meaningful results at reasonable cost. 

The most important component of a dependable embedded system is its data network. The data 
network provides the structure for a system’s architecture and also typically defines the system’s 
architecture. It also tends to be the component responsible for providing the highest level of fault 
containment — that is, it is the fault containment barrier of last resort. Thus, getting the data 
network correct is of utmost importance. 

Because of its importance, the data network should be the first consideration when selecting or 
creating a test specimen. Ideally, such a network should be an open standard that is accessible to 
anyone and widely used in safety-critical systems. Sadly, networks that fully support mixed syn- 
chronous/asynchronous traffic have not yet entered service, though some provide some extremely 
limited support. These networks are typically synchronous networks that provide some fixed allo- 
cations of bandwidth that can be used for “asynchronous” traffic. They include the ARINC 659 
SAFEbus [3], FlexRay [4], and TTCAN [5]. Of these, only ARINC 659 is a standard and has the
fault tolerance to be used in safety-critical systems. However, the Medium Access Control (MAC)
mechanism used to arbitrate within its “asynchronous” allocations (called Master/Shadow win- 
dows) requires the synchronous portion of the protocol be working in order for the “asynchronous” 
arbitration to succeed. 

TTEthernet is the only network that purports to support mixed asynchronous/synchronous 
traffic, has the fault tolerance for safety-critical systems, and has attained standard status. (The 
SAE AS6802 Time-Triggered Ethernet Standard [6] was published on November 1, 2011.)

A newer version of the Braided Ring Availability Integrity Network (BRAIN) [7, 8, 9] has the
possibility of being a network that supports mixed asynchronous/synchronous traffic and has the
fault tolerance for safety-critical systems; however, it is still under development. 

3.2 Example System Description 

3.2.1 The Example System’s Structure and Requirements 

Structure: Figure 1 shows the structure of an Example System we created (it is replicated as
the base for Figures 2 through 8). This structure consists of (from left to right in the figure) pilot
inputs, Processing Elements (PEs), TTEthernet switches (SW), Analog Control Electronics (ACEs),
sensors, and actuators (Act) working cooperatively to control a plant (e.g., an aircraft’s attitude
and speed). Most of the communication among the components occurs via TTEthernet links (blue
lines) that connect all the components, except for the actuators, to TTEthernet switches. The only
traffic that does not travel via TTEthernet uses the direct links (gray lines) that connect analog
control electronics to actuators. Each ACE drives just one of a triplex set of actuators.

Each type of component is triplex. In addition, the processing elements, analog control elec- 
tronics, and all of the TTEthernet components (the switches and the interfaces to all the devices 
that communicate via TTEthernet) are fail-silent dual components (either self-checking pairs or 
command/monitor redundancy). We use “fail silent” rather than “fail stop” because the whole 
component may not have stopped operating; only one or more of its output ports either produced 
no output or produced an output that is obviously incorrect (to any observer). A proper subset 
of this “fail silent” failure mode is the inconsistent omission failure mode. We use “fail stop” for 
cases where a component has completely failed on all of its output ports and is not producing any 
output. 

The pilot input and actuators are not fail silent. Each actuator can be made inert by a signal
from its associated ACE that causes the actuator to go into a mode where the actuator provides no
force or resistance to the other actuators. This signal occurs when commanded by an ACE or when
the two halves of the ACE pair disagree.

The PEs compute an envelope-protected and stability-augmented control of the plant as long
as failures do not prevent them from doing so. If all PEs or synchronization fail, the ACEs assume
fallback control. In the fallback mode, the ACEs run asynchronously and provide no envelope
protection or stability augmentation.

The PEs’ inputs come from the triplex pilot inputs and the triplex sensors. The PEs’ outputs are
sent to the ACEs, which then use the outputs to control the actuators. If all PEs or synchronization
fail, the pilot inputs go to the ACEs instead of the PEs and sensor input is not needed.


Requirements: The system needs to tolerate two uncorrelated faults with one possibly being
Byzantine. The system also needs to minimize force fight between the actuators. A force fight
occurs when multiple actuators try to move a flight surface to different positions, which results 
in the actuators exerting forces that, to some degree, oppose each other. The magnitude of the 
force fight is the difference in the forces applied by the actuators. To save power, the force fight 
should be as close to zero as possible. To prevent mechanical fatigue failure, the force fight must 
be less than 10% of the actuator input signal’s full range. To help meet this requirement, the ACEs
limit the rate of change in signals going to the actuators to be no more than 5% of the full-scale 
magnitude for each execution cycle of a thread. The total latency from pilot inputs and sensors to 
the actuators must be less than 50 ms. These requirements must be met in all modes of system 
operation. 
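
The rate-limit and force-fight requirements can be sketched as follows. This is a minimal illustration under stated assumptions: signals are normalized so that full scale is 1.0, the force fight is approximated by the spread of the actuator input signals, and all names are hypothetical rather than taken from the example system's models.

```python
FULL_SCALE = 1.0                       # assumed normalized signal range
MAX_STEP = 0.05 * FULL_SCALE           # 5% of full scale per execution cycle
MAX_FORCE_FIGHT = 0.10 * FULL_SCALE    # 10% of full range


def rate_limit(previous, commanded, max_step=MAX_STEP):
    """Limit the per-cycle change of the signal sent to an actuator."""
    delta = commanded - previous
    delta = max(-max_step, min(max_step, delta))
    return previous + delta


def force_fight(actuator_inputs):
    """Approximate force fight as the spread of the actuator input signals."""
    return max(actuator_inputs) - min(actuator_inputs)


def force_fight_ok(actuator_inputs, limit=MAX_FORCE_FIGHT):
    """Check the fatigue-related bound on force fight."""
    return force_fight(actuator_inputs) < limit
```

Limiting each ACE's output step to 5% per cycle bounds how far apart two ACE outputs can drift in one cycle when the ACEs start from agreeing values and follow the same trend, which is one way the rate limit helps the 10% bound.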

This example system was chosen to have a diverse set of attributes found in real safety-critical 
systems, but simple enough to meet the specimen selection criteria given in Section 3.1 and to
allow derivatives to be created easily (in order to test architectural variations, as described in
Section 3.4). It contains the important attributes of many airplane flight control systems, but is
not based on an actual flight control system. 


3.2.2 An Example System Behavior 

With the Example System structure and component descriptions from Section 3.2.1,
a number of different system behaviors can be created that satisfy the system requirements (also 
given in the previous section). This section describes one such set of behaviors. 

Asynchronous and Synchronous Modes of Operation: The system has two modes of operation:
asynchronous and synchronous. The asynchronous mode of operation is used as a minimal-functionality
fallback whenever the synchronization mechanism fails or all PEs fail. In the synchronous
mode of operation, all components are synchronized to the TTEthernet timeline, which
runs at a 20 Hz rate. In the asynchronous mode, all the components still run isochronously,
but none of the components are synchronized to any of the other components. Because of the
increased latency and jitter when running asynchronously, the components run at 80 Hz when in
the asynchronous mode.

Input Consistency: The fail-silent components (processing elements, analog control electronics,
and all of the TTEthernet components) consist of a pair of subcomponents that operate such that
each half of the pair does operations that are identical to the other half of the pair. The intention is 
that, given identical inputs to both halves of the pair, the identical processing will produce identical 
output (in the fault-free case). This creates a derived requirement to ensure that the inputs to
each of these pairs are identical between the two halves of the pair. This is done by having each
pair exchange their inputs via a private communication path not shown in any of the figures for
Section 3.2.

In addition to the two halves of each pair requiring identical inputs, each member of the pilot 
input redundancy set and each member of the ACE redundancy set need either all of their inputs
to be identical amongst the members of a set or the members of the set must exchange values to 
prevent internal state divergence (e.g. the value of integrators, mode state, etc.). 

In the synchronous mode of operation, the PEs and ACEs try to ensure that their inputs are
identical by exchanging a status frame¹ amongst themselves. For the consistency exchange usually
seen in the literature about the Byzantine Generals problem, the entire contents of each input are 
exchanged amongst all of the recipients. However, systems that use fail-silent pairs (such as ARINC 
659 SAFEbus and the TTEthernet used in our example system) can greatly reduce the required 
bandwidth by using “hierarchical Byzantine agreement”. In this scheme, the two halves of a pair 
exchange the entire contents for each of their inputs, to make sure both halves have bit-for-bit 
identical inputs. This can be done rather cheaply because the two halves of a pair are typically 
adjacent to each other, which allows low-cost high-speed communication hardware to be used on a 
private link that does not contend for the communication resources of the inter-component network. 
For any input that is not identical, the pair rejects the input. For agreement among multiple pairs, 
the pairs only need to exchange one bit that says whether or not they accepted or rejected an input. 
In our example system, the PEs exchange one frame with six bits of payload: three bits to say
whether or not each of the three pilot inputs was received OK, and another three bits to say
whether or not each of the three sensor inputs was received OK.

If a PE fails to receive one of these status frames, it assumes that the status frame would have
contained a status that matched its own. That is, any missing incoming status frame would have 
been identical to the status contents of the frame it, itself, had sent out. This covers the two most 
likely cases: 

(1) The sending PE had the same status, but its status frame failed somewhere between the sender
and receiver (the most probable case). 

(2) The sender has failed stop (the second most probable case). For this case, it doesn’t matter 
what the status would have been. 

Similarly, the ACEs exchange status amongst themselves using six bits, three for the PE inputs and
three for the pilot inputs. This scheme tolerates at least one Byzantine fault. It also tolerates an 
arbitrary number of Byzantine faults as long as no two faults have identical or supporting symptoms 
(colluding faults). 
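
The status exchange among the PEs can be sketched as follows. This is a minimal illustration, not the example system's implementation: the bit ordering within the six-bit payload, the function names, and the sender labels are assumptions. How the collected statuses are then combined into an accept/reject decision is not shown.

```python
def pair_accepts(half_a_input, half_b_input):
    """A fail-silent pair accepts an input only if both halves hold
    bit-for-bit identical copies after their private exchange."""
    return half_a_input == half_b_input


def encode_status(pilot_ok, sensor_ok):
    """Pack three pilot-input bits and three sensor-input bits into a
    six-bit payload (bit layout is an assumption)."""
    bits = list(pilot_ok) + list(sensor_ok)
    if len(bits) != 6:
        raise ValueError("expected three pilot and three sensor flags")
    value = 0
    for position, ok in enumerate(bits):
        if ok:
            value |= 1 << position
    return value


def collect_statuses(own_status, received):
    """Apply the missing-frame rule: a status frame that did not arrive
    (None) is assumed to match the receiver's own status."""
    return {
        sender: own_status if status is None else status
        for sender, status in received.items()
    }
```

For example, a PE whose pair rejected the third pilot input would send `encode_status([True, True, False], [True, True, True])`, and would substitute that same value for any peer whose status frame went missing.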

1 TTEthernet, like all Ethernets, uses “frame” where most avionics designers would use the term “message”.
Given that no other traffic interferes (either because there is no other traffic or because it consists
of small, low-priority frames whose interference is negligible), and given that the main players are
isochronous, the communication delay is bounded (and Byzantine agreement is possible).

Because the ACEs have to operate in the asynchronous mode, whereas the PEs do not, the ACEs
need to include some additional information in the exchanges amongst themselves. Without the
aid of a synchronous timeline, each ACE cannot be certain which status bit belongs to which PE
or pilot input frame, i.e., whether a particular status belongs to frame n-1, n, or n+1.
So, each of these status bits must be accompanied by some form of sequence tag. Then, each
ACE must implement some form of “waiting and matching section” to group status bits from the
same frame together. In addition, the asynchrony among the ACEs means that their internal state
can diverge even in the fault-free case. This requires that the ACEs do state exchanges whenever
they’re running asynchronously.
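
A "waiting and matching section" of the kind described above can be sketched as a store keyed by sequence tag. This is an illustrative sketch only: the class name, the source labels, and the use of an unbounded integer tag are assumptions, and timeouts for groups that never complete are omitted for brevity.

```python
from collections import defaultdict


class WaitingMatchingSection:
    """Group status reports that carry the same sequence tag, releasing a
    group once every expected source has reported (hypothetical sketch)."""

    def __init__(self, expected_sources):
        self.expected = frozenset(expected_sources)
        self.pending = defaultdict(dict)  # sequence tag -> {source: status}

    def receive(self, source, sequence_tag, status):
        """Store one report; return the complete group or None if still waiting."""
        group = self.pending[sequence_tag]
        group[source] = status
        if self.expected.issubset(group):
            return self.pending.pop(sequence_tag)
        return None
```

In the synchronous mode the timeline itself identifies which frame a status belongs to, so no such store is needed; this is part of the additional cost of asynchronous operation.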


Data Flows: The system’s communication data flow, from inputs to output, can be scheduled
in the synchronous case to occur in the following phases.

First, the pilot inputs are sent to the PEs. Figure 2 shows the top pilot input being sent to the
three PEs. At the same time, the second and third pilot inputs are sent to all three PEs. Only the
first pilot input is shown in the figure to keep the figure readable. In addition to
these pilot inputs going to the PEs, the same data is sent to the ACEs. This latter transmission is 
not needed for synchronous mode (which is a reason for the dotted lines in the figure). It is used 
to simplify the transition from synchronous mode to asynchronous mode. The transmission from 
the pilot inputs to the PEs uses TTEthernet’s time-triggered (TT) class of service. This class of
service minimizes latency and jitter in a synchronous system by restricting how early and how late 


12 





a frame can arrive with respect to its scheduled transmission time. At the same time, the pilot 
inputs are routed to the ACEs using TTEthernet’s rate-constrained (RC) class of service. This
class of service uses a “leaky bucket” type of protocol to limit how often a frame can be sent (to 
prevent bandwidth hogging). TTEthernet’s synchronization and timeline scheduling mechanisms 
do not need to be working in order for it to handle the RC class of service. 
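
The general idea of a "leaky bucket" limiter can be sketched as follows. This is a generic illustration of the technique, not the TTEthernet mechanism itself; the class name, the parameters, and the use of caller-supplied timestamps in seconds are assumptions.

```python
class LeakyBucket:
    """Generic leaky-bucket limiter: each accepted frame adds to the bucket
    level, the level drains at a fixed rate, and a frame is rejected if it
    would overflow the bucket (illustrative sketch)."""

    def __init__(self, leak_rate_per_s, capacity):
        self.leak_rate = leak_rate_per_s  # frames drained per second
        self.capacity = capacity          # maximum bucket level, in frames
        self.level = 0.0
        self.last_time = 0.0

    def offer(self, now_s, cost=1.0):
        """Return True if a frame offered at time now_s may be sent."""
        elapsed = now_s - self.last_time
        self.level = max(0.0, self.level - elapsed * self.leak_rate)
        self.last_time = now_s
        if self.level + cost <= self.capacity:
            self.level += cost
            return True
        return False
```

Note that the limiter depends only on the sender's local notion of elapsed time, which is consistent with the RC class not requiring the synchronization mechanism to be working.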

The dataflow from each sensor to the PEs is handled in the same way as for the pilot inputs, 
except that there is no RC flow to the ACEs. One of the three sensor flows is shown in Figure 3.



After the PEs receive the sensor and pilot inputs, they do the status exchanges for these input 
frames. One example of this status exchange flow is shown in Figure 4. Because TTEthernet is a
store-and-forward network, it takes one frame transit time to go from the PEs to the switches (all 
done in parallel). At the end of this time, all of the switches have all of the PEs’ frames. Since 
each PE must receive a frame from the two other PEs, the transit time from the switches to the 
PEs must be two frame times. However, because these frames have only a few bits of status, they use
the smallest Ethernet frame and the total transmission time is negligible.

The next data transfer phase is from the PEs to the ACEs. Again, it takes one frame time for 
all the PEs to get all of their frames to all of the switches. Then, the links from the switches to the 
ACEs must carry three frames (because each ACE must get input from all three PEs). This makes
the total propagation delay from the PEs to the ACEs equal to four frame times. The dataflow
from one PE to the three ACEs is shown in Figure 5.
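
The frame-time arithmetic for the two store-and-forward exchanges can be written out as a short sketch. The function names are hypothetical, switch processing latency is ignored, and the link speed used in the usage note is an assumption, since this section does not state one. The minimum Ethernet frame size (64 bytes), preamble plus start-of-frame delimiter (8 bytes), and interframe gap (12 bytes) are standard Ethernet values.

```python
def exchange_delay_frame_times(frames_per_downlink):
    """Delay in frame times for one exchange through a store-and-forward
    switch: one frame time on the uplinks (all senders in parallel), plus
    one frame time per frame carried serially on each downlink."""
    uplink_frame_times = 1
    return uplink_frame_times + frames_per_downlink


def min_frame_time_us(link_mbps):
    """Wire time of a minimum-size Ethernet frame in microseconds."""
    wire_bytes = 64 + 8 + 12  # frame + preamble/SFD + interframe gap
    return wire_bytes * 8 / link_mbps


pe_to_pe = exchange_delay_frame_times(2)   # each PE hears from two other PEs
pe_to_ace = exchange_delay_frame_times(3)  # each ACE hears from three PEs
```

Under these assumptions, the PE status exchange takes three frame times and the PE to ACE transfer takes four. On an assumed 100 Mbit/s link, a minimum-size frame occupies 6.72 microseconds, which is why the status exchange is negligible against a 50 ms latency budget.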

The next data transfer phase is for the input status exchange among the ACEs. While the 
dataflow patterns for this exchange are similar to that of the PEs, more data is exchanged because 
the frames that the ACEs exchange must carry the additional information needed for asynchronous 
operation. An example of the data flow from one ACE to the other two ACEs is shown in Figure 6.
This transfer must use the RC traffic class because this dataflow must work for both the synchronous
and asynchronous modes of operation.

Figure 4. PE to PE Dataflow

Figure 5. PE to ACE Dataflow



The next communication phase is from the ACEs to their actuators. This communication is
point-to-point from one ACE to one actuator, replicated three times. This communication involves
no protocol; it is just direct wires. Figure 7 shows all three of these ACE to actuator flows.

The preceding paragraphs capture all of the communication flows for the synchronous mode of
operation. The asynchronous mode of operation reuses the flows shown in Figure 6 and Figure 7.
The other flows for the synchronous case are not used in asynchronous operation. There is an
additional flow for the asynchronous case, which is shown in Figure 8. This flow replaces the first flow
described for the synchronous case (the description associated with Figure 2).

3.3 Synchronous Timelines 

The timing relationships for TTEthernet communication are established by design-time schedulers. 
A simplistic timeline for this example system is shown in Figure 9. Much lower latency could be
achieved by a more sophisticated scheduler. The simplistic timeline is shown here because it is 
easiest to understand. 

3.4 Possible Derivative Systems 

The Example System is easy to change to look at the implications of architectural variations. Some 
of the changes that could be made include the following. 

• replace TTEthernet as the example communication infrastructure, e.g. with a BRAIN having
store-and-forward mechanisms for mixed synchronous and asynchronous operation

• make the system fully synchronous

• make the system fully asynchronous

• replace fail-silent components with simplex components

• try different methods for preventing state divergence and mode inconsistency

• make the resource use of the pilot input threads non-negligible

• add mechanisms for recovery of failed components (e.g. transient fault recovery)

Figure 8. Pilot Input to ACE Backup Dataflow

Figure 9. Simplistic Timeline

3.5 Questions to Be Answered 

A system designer faced with making decisions about designing a system that may be asynchronous, 
synchronous, or both needs to answer a number of questions. Some of the questions that formal 
methods and modeling can help answer are presented in this section. 

3.5.1 Resource Comparison 

How do the resources required for asynchronous operation compare to synchronous operation? 

In our example system, we will say that synchronous mode uses 20% of each PE’s resources, 
10% of each ACE’s, and a negligible amount of the pilot input. Further, to simplify analysis, we 
will say that each frame consumes 1 ms of link bandwidth. Switch buffers are a little harder to
quantify. One way of doing this is to say that one unit of buffer is the space required to hold 1 ms of
traffic from one port. Note that TTEthernet in this example system uses store-and-forward (versus
cut-through).

For communication bandwidth, we will use the numbers given in the inset within Figure 9. The
message times, in milliseconds or microseconds, are the times for messages to traverse a single link 
from the sender’s buffer to the receiver’s buffer. 

Is our estimate correct that asynchronous operation needs to run at 80 Hz in order to meet the 
requirements of 50 ms end-to-end latency and no more than 10% force fight? 

3.5.2 Synchronous-to-Asynchronous Reversion

Can we create an algorithm/mechanism that successfully reverts from synchronous operation to 
asynchronous operation under any failure scenario allowed in the system’s fault hypothesis? Can we 
prove that such an algorithm/mechanism is correct? Can we show that the transition period from
synchronous to asynchronous causes no operation that violates any of the system’s requirements?

3.5.3 Noninterference 

Do any of the components or subcomponents that are asynchronous interfere with any of the 
components or subcomponents that are synchronous, or vice versa? Such interference comes from 
resource contention. In this example system, the possible resources that could be in contention 
are processor resources (e.g. CPU time, memory, etc.), link bandwidth, and switch buffer space. 
Processor resources exist in the pilot input components, the PEs, and the ACEs.

The pilot input components include two threads. One thread is synchronous to the TTEthernet 
timeline. The other thread runs at the same isochronous rate as the first thread, but is not 
synchronized to TTEthernet or to the first thread, making the pilot input an example of a mixed
asynchronous/synchronous component.

The ACEs have two threads: one synchronous with TTEthernet and the other not synchronous
with TTEthernet or the other thread. The difference between this threading model and the threading
model for the pilot input components is that the two ACE threads aren’t active at the same
time.

4 AADL Modeling

This section describes how we applied the Society of Automotive Engineers (SAE) Architecture Analysis
& Design Language (AADL) to model the representative case studies described in Section 3.

4.1 TTEthernet Case Study: Synchronous and Asynchronous Models 

The sequence of computation steps, data exchanges and additional exchanges for maintaining input
consistency of data (described in Section 3) that need to be modeled on a timeline is shown in
Figure 10. It illustrates the data flows between, and the computations at, each of the different nodes
in the system, namely Pilot Inputs (PIs), Processing Elements (PEs), Analog Control Electronics
(ACEs), Sensors and Actuators. The essential high-level requirement is that the closed-loop
system operates at 10 Hz (100 ms period), i.e. a round trip of 100 ms for Sensor to PE to
ACE to Actuator and then back through the plant (actuator and aircraft dynamics, plus sensing
lag), with at most a 50 ms budget for each direction.

Our objective is to study the resource utilization and latency of this system with (i) data flows
modeled as time-triggered (TT) traffic, representing a synchronous system that utilizes a global
time base, versus (ii) data flows modeled as rate-constrained (RC) traffic, representing
an asynchronous system that does not rely on a time base, cannot align phase
relationships between the different nodes in the system, and hence has to account for worst-case
behavior. Note that the figure is not drawn to scale and is not representative of actual times for the SYNC
(TT) vs ASYNC (RC) cases. It has been staggered just for illustration purposes.

Figure 11. Delay Components End-to-End for Worst Case Synchronous (TT) Model

Figure 11 shows the state chart capturing the individual components of the delay that contribute
to the end-to-end latency for the synchronous (TT) model of the example system. The computation
delay at every node in the system is represented by the parameters PI Compute Delay, SEN(sor)
Compute Delay, PE Compute Delay, ACE Compute Delay and ACT(uator) Compute Delay. The
inputs, outputs and computations at PI, SEN, PE, ACE and ACT are all modeled as periodic
events with a 50 ms period. Additionally, the PEs and ACEs exchange data/status among
themselves to maintain state congruence, i.e. among the 3 PEs and among the 3 ACEs. PEs
have an Input Exchange with the other two PEs for maintaining consistency
of data arriving from SEN and PI. Likewise, ACEs have both an Input Exchange with the other two ACEs
for data arriving from PEs and an Output Exchange with the other two ACEs before sending data to ACT.

Finally, the parameters Tx Data Delay and Rx Data Delay in Figure 11 represent the fixed
(constant) delays through the network for transmission and reception, respectively, of a
data message flowing from source to destination. Note that the synchronous model assumes
the network interface cards (NICs) and the nodes (hosts) are in phase, i.e. the network time base
end-to-end, including clocks at NICs in the individual nodes, clocks in the underlying network
switches and the clocks in the host processors in the individual nodes, are all aligned with each
other. Hence the time-triggered (TT) traffic flow requires low/zero jitter to transfer from the source
host, through the source NIC and the underlying network switches, all the way to
the receiver NIC and the receiver host. Since TT traffic is always completely scheduled through
the network, whereby traffic interference is explicitly mitigated through scheduling, data delays
are represented as “fixed” constant delays in the synchronous model, with no variability, which can
then be calculated a priori and plugged into the model. Similarly, Tx Status Delay and Rx Status
Delay represent constant delays for “status” messages, which may be smaller than
data messages and are primarily used for PE Input Exchanges and ACE Input and Output
Exchanges to exchange the vector of valid messages received in order to maintain consistency.

Figure 12. Delay Components End-to-End for Worst Case Asynchronous (RC) Model

Similarly, Figure 12 shows the state chart capturing the individual components of the delay 
that contribute to the end-to-end latency for the asynchronous (RC) model of the example system. 
Comparing Figures 11 and 12 it can be observed that all the states and component delays in 
synchronous model are also present in the asynchronous model. There are two key differences 
which are described next. 


Firstly, in the asynchronous model, the delays through the network are no longer fixed, constant
delays. They are inherently variable latencies (though deterministically bounded) with non-zero
jitter. This is primarily because the RC traffic is never scheduled anywhere in the network. Therefore,
the worst-case interference of equal or higher priority traffic must be taken into account when
calculating the delays. Since the complete set of dataflows in the system, or workload, is well defined
and known a priori, the worst-case network queuing delay is typically calculated using techniques
such as network calculus or other approaches. AADL v1 currently provides latency plug-ins which
help specify the delay components for both globally synchronous and asynchronous systems.
However, AADL does not by itself provide the necessary queuing analysis techniques for worst-case
latency calculations, and hence these need to be pre-specified in the AADL v1 model. Hence,
in order to model asynchronous systems, we introduce the model parameters Input2PE NW Worst
Queue Delay, PE2PE NW Worst Queue Delay, PE2ACE NW Worst Queue Delay and ACE2ACE
NW Worst Queue Delay. These parameters represent the “worst-case” queuing delays for
the RC dataflows PI to PE, PE to PE, PE to ACE and ACE to ACE, respectively. In our model,
we specify fixed, constant values for these parameters. The key assumption is that the system
is analyzed through an off-line queuing analysis tool, upfront, and then the analysis results are
applied to the AADL model.

Secondly, in AADL v1 (as far as we know), the latency analysis plug-in accounts for the
“asynchronous” phase relation between the network bus and the tasks that run on processors in the model
by adding a worst-case penalty of one period to the end-to-end delay calculation. This is done to
ensure all the communications on the network bus meet timeliness guarantees end-to-end, because
tasks on the processors are associated with local timelines in an asynchronous mode instead of
a global timeline, and hence activities are not guaranteed to occur within the same period boundary.
The net effect is that at every boundary (host processor to network NIC, network NIC to host
processor) where there is potentially an asynchronous crossing of time bases, one period of delay must
be added. We also understand that there may be finer-grained mechanisms for specifying
phasing relationships in the AADL v2 model (and the latency plug-ins) for asynchronous
systems. At the time of this report in Phase I, we have not investigated AADL v2 and so are
limited to the asynchronous mechanisms provided by AADL v1.

Model Parameter                                     Constant Value (ms)
DataDelay (Tx Data Delay, Rx Data Delay)            1.0
StatusDelay (Tx Status Delay, Rx Status Delay)      0.001
PIComputeDelay                                      0.001
SENComputeDelay                                     0.001
PEComputeDelay                                      10.0
ACEComputeDelay                                     5.0
ACTComputeDelay                                     0.001
Input2PENWWorstQueueDelay                           1.002
PE2PENWWorstQueueDelay                              2.001
PE2ACENWWorstQueueDelay                             4.0
ACE2ACENWWorstQueueDelay                            5.0
Period                                              50.0

Table 1. Model Parameter Constants

Due to the above limitation in AADL v1, we are limited to adding one period at every asynchronous
boundary crossing. The asynchronous boundaries in our system are at the
receivers at each of the five stages, namely: PE Input Exchange (when data is
received from PIs and/or SENs), PE to ACE (when data is received from PEs at the ACE), ACE
Input Exchange, ACE Output Exchange and ACE to ACT (sending the command to the actuator). Adding
one period (50 ms) at each of these boundaries would make the end-to-end closed loop infeasible
because of our requirement that the system meet a one-way latency of 50 ms or a round-trip
latency of 100 ms. In order to make the asynchronous model feasible, we oversample at each of these
stages with parameters N1, N2, N3, N4 and N5, where each Ni ∈ {1, 2, 3, ...}. As shown in Figure 12,
this oversampling approach implies that the periods are shortened at the corresponding receivers by
factors of 1/N1, 1/N2, 1/N3, 1/N4 and 1/N5, so that for some setting of the Ni the asynchronous model
yields a feasible solution meeting the one-way latency of < 50 ms. Note that oversampling results in increased
processor utilization (processor overhead) and network bandwidth reservation (bandwidth overhead).


4.2 Constants for Model Parameters and Analytical Formulation for Worst Case Latency

The constant values used for the different model parameters discussed in Section 4.1 are
listed in Table 1. These values were chosen so as to be reasonably representative of the actual
system. The worst-case queue delays given by Input2PE NW Worst Queue Delay, PE2PE NW
Worst Queue Delay, PE2ACE NW Worst Queue Delay and ACE2ACE NW Worst Queue Delay
were calculated for the different traffic seen by the underlying network, assuming a
worst-case arrival of that traffic as seen by the network.

Based on the state diagrams shown in Figures 11 and 12, the worst-case latencies can be calculated
explicitly in a straightforward manner and are listed in Equations 1 and 2 below, respectively.


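\begin{align*}
\mathit{Latency}^{\mathit{Sync}}_{\mathit{WorstCase}} ={}& \max(\mathit{PIComputeDelay} + 2\,\mathit{DataDelay},\ \mathit{SENComputeDelay} + 2\,\mathit{DataDelay}) \\
&+ 3\,\mathit{StatusDelay} + \mathit{PEComputeDelay} + 4\,\mathit{DataDelay} + 3\,\mathit{StatusDelay} \\
&+ \mathit{ACEComputeDelay} + 3\,\mathit{StatusDelay} + 2\,\mathit{DataDelay} + \mathit{ACTComputeDelay} \\
={}& \max(\mathit{PIComputeDelay},\ \mathit{SENComputeDelay}) + \mathit{PEComputeDelay} \\
&+ \mathit{ACEComputeDelay} + \mathit{ACTComputeDelay} + 9\,\mathit{StatusDelay} + 8\,\mathit{DataDelay} \tag{1}
\end{align*}

\begin{align*}
\mathit{Latency}^{\mathit{Async}}_{\mathit{WorstCase}} ={}& \max(\mathit{PIComputeDelay},\ \mathit{SENComputeDelay}) + \mathit{PEComputeDelay} \\
&+ \mathit{ACEComputeDelay} + \mathit{ACTComputeDelay} + 9\,\mathit{StatusDelay} + 8\,\mathit{DataDelay} \\
&+ \mathit{Input2PENWWorstQueueDelay} + \mathit{PE2PENWWorstQueueDelay} \\
&+ \mathit{PE2ACENWWorstQueueDelay} + 2\,\mathit{ACE2ACENWWorstQueueDelay} \\
&+ \frac{\mathit{Period}}{N_1} + \frac{\mathit{Period}}{N_2} + \frac{\mathit{Period}}{N_3} + \frac{\mathit{Period}}{N_4} + \frac{\mathit{Period}}{N_5} \tag{2}
\end{align*}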

Plugging the model constants listed in Table 1 into Equations 1 and 2, we get the worst-case
end-to-end latency shown in Table 2. While the synchronous model using TT traffic meets the
latency requirement of 50 ms, the asynchronous model using RC traffic must be oversampled 26
times (N1 = N2 = N3 = N4 = N5 = 26) in order to meet the 50 ms latency requirement.
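The arithmetic behind these numbers can be checked with a short script. This is a sketch that transcribes Equations 1 and 2 and the Table 1 constants; the function names, the dictionary layout, and the restriction to a uniform oversampling factor in the search are assumptions made for the example.

```python
# Constants from Table 1, all in milliseconds.
P = {
    "DataDelay": 1.0, "StatusDelay": 0.001,
    "PIComputeDelay": 0.001, "SENComputeDelay": 0.001,
    "PEComputeDelay": 10.0, "ACEComputeDelay": 5.0, "ACTComputeDelay": 0.001,
    "Input2PENWWorstQueueDelay": 1.002, "PE2PENWWorstQueueDelay": 2.001,
    "PE2ACENWWorstQueueDelay": 4.0, "ACE2ACENWWorstQueueDelay": 5.0,
    "Period": 50.0,
}


def sync_worst_case_ms(p):
    """Equation 1 in its simplified form."""
    return (max(p["PIComputeDelay"], p["SENComputeDelay"])
            + p["PEComputeDelay"] + p["ACEComputeDelay"] + p["ACTComputeDelay"]
            + 9 * p["StatusDelay"] + 8 * p["DataDelay"])


def async_worst_case_ms(p, n):
    """Equation 2; `n` holds the five oversampling factors N1..N5."""
    queueing = (p["Input2PENWWorstQueueDelay"] + p["PE2PENWWorstQueueDelay"]
                + p["PE2ACENWWorstQueueDelay"] + 2 * p["ACE2ACENWWorstQueueDelay"])
    boundaries = sum(p["Period"] / ni for ni in n)  # one shortened period per crossing
    return sync_worst_case_ms(p) + queueing + boundaries


def min_uniform_factor(p, budget_ms=50.0, max_factor=1000):
    """Smallest N with N1 = ... = N5 = N that is strictly under the budget."""
    for factor in range(1, max_factor + 1):
        if async_worst_case_ms(p, [factor] * 5) < budget_ms:
            return factor
    return None  # no feasible factor within the search range


for factor in (1, 2, 3, 10, 15, 20, 25, 26):
    print(factor, round(async_worst_case_ms(P, [factor] * 5), 6))
```

Non-uniform factors are not explored here; since each boundary contributes Period/Ni independently, a designer could in principle trade oversampling at one stage against another.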

4.3 Instantiating TTEthernet Case Study in AADL 

The TTEthernet case study is a voted high-integrity architecture; it uses triple redundancy for all 
functional components. 

Figure 17 shows the high-level overview of the TTEthernet-based case study. It consists of 3
Pilot Input, PE, Switch, ACE, Sensor, and Actuator components. We describe these components
in more detail below.


Pilot Input: We have captured the Pilot Input component as an AADL system, as shown
in Figure 13. The Pilot Input component consists of a Network Interface Controller (NIC), a
processor, and a memory, all connected through an on-chip interconnect. The processor executes a
single thread implementing the Pilot Input functionality in software.

The Pilot Input component is a self-checking pair. Both elements in the pair have the same
configuration; therefore we have omitted the second element of the pair from Figure 17.

Figure 13. Pilot Input Component

Processing Element: The PE component, illustrated in Figure 14, performs computation based
on the Pilot Input and Sensor readings to derive the optimal input values for the ACEs. The PE
consists of a NIC, a processor, and a memory, all connected through an on-chip interconnect, and a
single thread running on the processor, executing the PE functionality in software.

Scenario                                            End-to-End Latency (ms)
                                                    (must be < 50 ms to be feasible)
Synchronous Model                                   23.011
Asynchronous Model (50 ms period, oversampled
with N1, N2, N3, N4, N5 below)
   N1   N2   N3   N4   N5
    1    1    1    1    1                           290.014
    2    2    2    2    2                           165.014
    3    3    3    3    3                           123.3473333
   10   10   10   10   10                           65.014
   15   15   15   15   15                           56.68066667
   20   20   20   20   20                           52.514
   25   25   25   25   25                           50.014
   26   26   26   26   26                           49.62938462

Table 2. Theoretical Results from Analytical Computation


Switch: The Switch component models the TTEthernet switch, managing communication over
the network. We captured the switch as an AADL device, with multiple data ports corresponding
to connections to the NICs connected to the switch.

Analog Control Electronics: The ACE component, illustrated in Figure 15, models the electronics
in charge of driving the Actuators based on signals received from the PEs. The ACE consists
of a NIC, a processor, and a memory, all connected through an on-chip interconnect, and a single
thread running on the processor, executing the ACE functionality in software.

Figure 14. Processing Element Component

Figure 15. Analog Control Electronics Component


Sensor: The sensors provide data that is used to close the control feedback loop. Sensor data is
fed back to the PEs, where it is used to augment the Pilot Input to the ACEs. We have modeled
the Sensor as an AADL system consisting of a sensor device and a NIC. The sensor AADL model
is presented in Figure 16.

Actuator: The actuators were captured as AADL devices. These model the hydro-mechanical
units and servos controlling wing surfaces in the aircraft.


4.4 Modeling Synchrony and Asynchrony using AADL 

We have created three versions of the TTEthernet example: a synchronous model, an asynchronous
model, and a direct mode asynchronous model.

Figure 16. Sensor Component

Synchronous model: The synchronous model is a good fit for AADL modeling. Global synchrony
is assumed by AADL as well as its latency analysis plug-in. All threads are dispatched periodically,
and communication delays can be expressed explicitly.


Asynchronous model: Initially, we had trouble expressing the asynchronous model, as the key
assumption of global synchrony no longer holds. AADL has no concept of local clocks, or branching
time as used in Computation Tree Logic (CTL) [10]; thus all clock drifts must be expressed in terms
of a global, linear notion of time. We have achieved this by introducing “boundary” delays between
the network and components, where we add up the worst case latencies in the model.

Direct asynchronous mode: The third option we consider is a “direct mode” open-loop flight
control architecture. In this configuration, the sensors and PEs are disabled; thus there is no
feedback, and all control surfaces are controlled directly by the pilot. When synchronization is lost,
a flight control system can quickly revert to this direct mode.

Compared to the asynchronous feedback control architecture, latencies will be lower in the direct
mode control, and thus performance/latency constraints are easier to meet. The open-loop control,
however, requires an airplane that is inherently stable, such as commercial transport airplanes,
and is not able to control aircraft with negative or relaxed static stability, such as the F-16.

4.5 Real-time Properties: Schedulability Analysis and Latencies 

In AADL, real-time properties and constraints are specified in multiple places. AADL was designed
for schedulability analysis from its early stages. One can assign periods, deadlines, and dispatch
policies to threads. Thus, one can quickly apply classic scheduling theory to analyze whether the
AADL system is schedulable. This abstraction closely follows common assumptions of scheduling
theory, such as periodic tasks that are triggered at a specified rate, independently of task
dependencies.
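As one example of the classic scheduling theory referred to here, the Liu and Layland utilization bound for rate-monotonic scheduling can be checked in a few lines. This is a sketch of a textbook test, not of any AADL plug-in; it assumes independent periodic tasks with deadlines equal to periods on one preemptive processor, and the task sets below are hypothetical.

```python
def rm_utilization_bound(n: int) -> float:
    """Liu and Layland bound n * (2^(1/n) - 1) for n periodic tasks."""
    return n * (2 ** (1 / n) - 1)


def rm_schedulable(tasks) -> bool:
    """Sufficient (not necessary) rate-monotonic test.

    `tasks` is a list of (worst_case_execution_ms, period_ms) pairs.
    """
    utilization = sum(wcet / period for wcet, period in tasks)
    return utilization <= rm_utilization_bound(len(tasks))


# Hypothetical task sets, for illustration only.
light_load = [(10.0, 50.0), (5.0, 50.0)]   # utilization 0.30
heavy_load = [(30.0, 50.0), (25.0, 50.0)]  # utilization 1.10

print(rm_schedulable(light_load), rm_schedulable(heavy_load))
```

A task set that fails this test is not necessarily unschedulable; an exact response-time analysis would be needed to decide.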

Figure 17. TTEthernet Example Modeled using AADL

Latency analysis in AADL builds on a different approach, added as an experimental feature in
AADL v1. Latencies in AADL can be assigned to flows, connections, and buses. End-to-end flows can
then be specified by describing how multiple flows are connected to form a data flow path through
the AADL model. The latency analysis can then calculate the end-to-end latency by summing the
latency numbers along the path.
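The path-summing idea can be illustrated as follows. This is a minimal sketch rather than the plug-in's implementation; the `FlowSegment` type, the segment names, and the latency values are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FlowSegment:
    """One flow, connection, or bus on an end-to-end path (hypothetical type)."""
    name: str
    latency_ms: float


def end_to_end_latency_ms(path) -> float:
    """Sum the latencies assigned to each segment along the path."""
    return sum(segment.latency_ms for segment in path)


# Hypothetical end-to-end flow: source flow, bus, path through a component, sink.
path = [
    FlowSegment("sensor_source", 0.5),
    FlowSegment("bus_connection", 1.0),
    FlowSegment("controller_path", 10.0),
    FlowSegment("actuator_sink", 0.5),
]
total_ms = end_to_end_latency_ms(path)
```

Note that nothing in such a sum depends on thread periods or priorities, which is the independence from schedulability analysis discussed next.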

The schedulability analysis in AADL is currently independent of the latency analysis, and the
latency analysis is independent of the schedulability analysis. Thus, changing latencies along a flow
has no impact on schedulability analysis, and vice versa. Ideally, one could apply methods that
automatically derive jitters for tasks as influenced by latency analysis, similar to the approach
described in [11].

The modeling of asynchronous threads in AADL is another challenge. Global synchrony is a
key assumption in AADL. The dispatch policy of a thread can be aperiodic; however, it is not clear
what triggers the thread’s execution in that case. A thread may have an incoming data flow, with
an associated latency. However, the semantics are not fully specified, and thus open to interpretation.
One might assume event semantics for flows, saying that a thread executes whenever it receives
data through an incoming flow, as specified by latencies.

Moreover, periodic threads can also be asynchronous, as they may be triggered by local clocks
that can potentially drift from each other. Since AADL assumes a globally synchronous system,
however, specifying the thread dispatch policy to be periodic would imply a globally synchronous
system.

As the examples above show, AADL timed semantics are ambiguous, and change depending on
the context. This presents a problem when modeling mixed synchronous/asynchronous systems, as
one needs to reason about potentially conflicting real-time constraints. Moreover, for latency analysis,
jitters must be manually calculated and specified in the models by the designers. Once
all such parameters are manually calculated, adding up the latency numbers is of little extra value
as provided by AADL. By bridging the gap between the periodic task model and latency analysis, the
burden of calculating jitters could be automated.

4.6 AADL State of the Art 

This section describes the state of the art in AADL as it relates to the fault and real-time analysis
performed in this study. We hope that the discussion of the current AADL version will motivate a
discussion that may potentially benefit future versions of AADL.

Tasks: Threads are tasks. They support three interaction paradigms: directional flow of information
through ports, access of a shared logical resource via data access, and request for services
via subprogram access. In the case of directional flow of information, there are three variants: 
event communication, message communication with queuing, and sampling communication with- 
out queuing. These communication paradigms can be combined with the different thread dispatch 
semantics of time-triggered dispatch (periodic) and data-driven dispatch (aperiodic or sporadic). 
For example, we can represent a health monitoring system that periodically polls its alarm event 
queue rather than responding to each individual alarm arrival with a dispatch. 

Asynchrony: According to [12], the original AADL standard defines thread dispatch and communication
semantics in terms of a global clock. The Normal System Operation section of the AADL
Standard [13] mentions that asynchronous system semantics are allowed. This is done by introducing
additional properties and a device type to represent the clocks. AADL Version 2 explicitly
introduces the notion of multiple clocks. These are referred to as synchronization domains, expressed
by a predeclared device type to represent the clocks. In the model, a property Reference_Time can
identify the local clock.

A globally synchronous system is relevant for communication between periodic threads. In
the case of aperiodic message or event communication (event data ports), the arrival of the
message/event triggers the dispatch. Thus, dispatching occurs independent of clocks; it may be queued
if the thread is still active with a previous dispatch.

For periodic threads there are immediate, delayed, and, in AADL Version 2, sampling port connections
on data ports. Immediate and delayed connections require time-based coordination of thread
dispatch. For immediate connections, the recipient’s execution start is delayed until the source thread
has completed its execution. In the case of delayed connections, the communication must delay the
delivery of the data until the next frame, possibly by double buffering it. In other words, time-based
coordination is necessary to enforce all of these semantics.

Sampling connections indicate that the recipient samples the input at its rate independent of
the thread dispatch and completion. This allows these connections to operate in both synchronous
and asynchronous systems.

4.7 Results and Conclusion 

The modeling work is still in progress; thus, the conclusions and inferences drawn below are a
“snapshot in time”. These will change as our knowledge of AADL modeling techniques improves and
we get feedback from the AADL community. We have joined the AADL committee and have ongoing
interactions with the principals.

So far, the modeling work has yielded a number of lessons. The difficulty in expressing
protocol-centric failure behavior indicates that if the long-term goal is to use AADL models as key repositories
for generating system dependability attributes, more work needs to be done. However, the promise
of model-driven dependability analysis, with its potential for automated generation of system fault
trees and Failure Mode and Effects Analyses (FMEAs), is very exciting. As the technology matures,
it may allow for more systematic architectural trade-offs and safe and informed architectural
optimization.

As with most model development, the ability to do system exploration of the modeled domain 
is a key benefit. Similarly, the ability to capture the rationale of design and the assumptions that 
underpin the model is of keen interest. In this regard, we found the application of the simple fault 
taxonomy and naming convention to be very beneficial and effective. Exploring the error modes at 
multiple architectural layers allowed for a more systematic examination of conscience, making the 
modeler reconsider the potential failure contributions at each layer. 

An interesting side-effect of this naming convention is that such a semi-mechanical examination
“checklist” yielded a potential error model state explosion as the taxonomy of error failure modes
was applied to components we previously considered to be simple. For example, the modeling
of a driver required decomposition into smaller sub-components to facilitate efficient modeling of 
concurrent failure manifestations. 

Attempting to map all ingress and egress error manifestations to a single state machine rapidly 
became intractable, and a hierarchical decomposition of the driver was required to separate potential 
concurrent error contributions. For example, a single integrated-circuit quad driver yielded an error 
model with eight internal error state machines, as separate ingress and egress error manifestations 
were captured. The totality of these eight state machines was much less complex than a single 2^8
input state machine. From our experiences with the driver components, we feel that a generalized 
method to guide hierarchical decomposition may be beneficial (and potentially critical) if resulting 
models are to remain tractable for analysis. 

Our work so far has been performed largely bottom-up, focusing on connectivity and protocol
layers. We feel that a similar methodology would also be beneficial if applied top-down, where
application and software developers also declare the fault model for the expected communication
exchanges using a similar taxonomy. Formalizing the expectations of each layer may then provide
for greater application and platform reuse and, in the longer term, automate consistency checking
of application requirements with the underlying platform and communication layer guarantees.

A key challenge we identified in modeling error propagation and mitigation is the need to 
compose multiple, potentially heterogeneous models of computation in order to express the behavior 
of both the analyzed system and error propagations. The current AADL Error Annex relies on a
probabilistic automata context, whereas AADL itself is defined using data flow-like semantics. The
composition of such models must be captured for formal analysis of error propagation in BRAIN.
Moreover, the behavior of the BRAIN nodes themselves must also be captured, potentially through
the AADL Behavioral Annex, Finite State Machines (FSMs), or other formal languages.

Furthermore, it may become practically impossible to capture different aspects and multiple 
levels of abstraction in the same formal model. Reusing verification results from the formal verifi- 
cation of protocol functionality may help to “guide” the error propagation analysis. Probabilistic 
analysis lends itself especially well as a semantic domain for integrating analysis results; this may
be a role for SRI’s Evidential Tool Bus (ETB) integration.

To produce truly re-usable models, a better layering methodology needs to be developed. We
feel that a weakness of the AADL modeling approach is that a driver model must have knowledge
of the upper protocol layers. This is illustrated in the Time-Triggered Protocol (TTP/C) modeling,
where protocol-centric failures (i.e., those that were a function of semantic content or timing)
required declarations and pass-through mappings within the driver component for protocol error
propagations, even though an actual driver would have no knowledge of protocol data or time
semantics. From a re-use perspective, such mappings introduce semantic layer pollution that
precludes component reuse. In the ideal case, a layering hierarchy should be developed to allow for
greater abstraction and pass-through of higher-layer error events. This would allow a driver model
to remain agnostic to specific system target instantiations.

One difficulty in developing models without an available execution and analysis environment is
completeness. AADL states that an error specification is erroneous if any input propagation is not
captured within a guard. Although layering the models more effectively may help here by allowing
events from different layers to pass through states without explicitly specifying them, it may also
complicate the assessment of completeness.

The question of completeness is further compounded by AADL’s strict ordering of guard actions.
In AADL, the order of guard conditions is important, with the first matching guard taking precedence
over others below it in lexical order. Although we welcome the rigor this brings to the specification,
we also believe that this is an area where formal model translation, simulation, and analysis (model
checking) will be greatly beneficial to the modeler, ensuring that the specified behavior is what the
modeler intended.
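
The interaction between guard ordering and completeness can be illustrated with a small sketch.
The Python below is not AADL Error Annex syntax and does not reproduce any tool's semantics; it
only assumes first-match-wins evaluation as described above. The guard names, the propagation
names ("babbling", "omission"), and the helper functions are all hypothetical. It enumerates every
combination of input propagations and reports guards that can never be selected, as well as
input combinations that no guard captures.

```python
from itertools import chain, combinations
from typing import Callable, FrozenSet, Iterable, List, Optional, Tuple

# A guard pairs a (hypothetical) name with a predicate over the set of
# error propagations currently present on the component's inputs.
Guard = Tuple[str, Callable[[FrozenSet[str]], bool]]


def first_match(guards: List[Guard], inputs: FrozenSet[str]) -> Optional[str]:
    """Return the name of the first guard, in listed order, that matches."""
    for name, predicate in guards:
        if predicate(inputs):
            return name
    return None


def all_input_sets(alphabet: Iterable[str]) -> List[FrozenSet[str]]:
    """Enumerate every combination of input propagations (the power set)."""
    items = sorted(alphabet)
    subsets = chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1)
    )
    return [frozenset(s) for s in subsets]


def check_guards(guards: List[Guard], alphabet: Iterable[str]):
    """Return (guards that never win, input sets matched by no guard)."""
    selected = set()
    uncovered = []
    for inputs in all_input_sets(alphabet):
        winner = first_match(guards, inputs)
        if winner is None:
            uncovered.append(inputs)
        else:
            selected.add(winner)
    shadowed = [name for name, _ in guards if name not in selected]
    return shadowed, uncovered


# Illustrative guard list: the specific "babbling" guard is listed after a
# more general guard, so it can never take precedence.
GUARDS: List[Guard] = [
    ("no_error", lambda s: not s),
    ("any_error", lambda s: len(s) > 0),
    ("babbling", lambda s: "babbling" in s),
]

shadowed, uncovered = check_guards(GUARDS, {"babbling", "omission"})
print("shadowed guards:", shadowed)
print("uncovered input sets:", uncovered)
```

Exhaustive enumeration is only practical for small propagation alphabets; for realistic models the
same question would more plausibly be posed to a model checker.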

On a similar note, an issue already under discussion in the AADL working committee is the ability
of a component to query the internal state of a component it is connected to. Using such coupling,
it is possible to completely circumvent the AADL Error Annex error propagation mechanism and
to code error transitions from coupled state knowledge.

For longer-term reuse, we feel it is preferable to restrict the coupling between components to
modeled events, with the possible exception of signaling an “error-free” state. The models
in this report have adhered to this restriction.

In relation to the use of error probabilities, we have two findings. When discussing the more 
esoteric failure modes, determining the probabilities for error manifestations is sometimes closer to 
art than science. In an initial system model (where detailed reliability models and evidence are not 
available), complex failure modes are often estimated by simple rules of thumb, e.g., an expectation
that 1% of permanent failures result in babbling.

Currently, such assumptions can be modeled by adding intermediate states to the error model. 
However, we feel that the ability to express an event occurrence as a function of another event 
occurrence may be beneficial. For example, using something like: 

$\mathit{occurrence}_{\mathit{fail}\,X} = \mathit{occurrence}_{\mathit{fail}\,Y} \times 0.01$

which means that the probability of X occurring is 1% of the probability of another error event
(Y). In the early stages of model development, this may not require all state probabilities to sum
to one, but we need to conduct more informal explorations to test the sensitivity to such assumptions.
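
As a rough illustration of what such a derived occurrence could look like in an analysis tool, the
following Python sketch resolves an event's probability either from a directly specified value or
as a fraction of another event's probability. The class, the function, the event names, and the
base probability of 1e-5 are hypothetical placeholders; only the 1% factor comes from the rule of
thumb above. It is not a proposal for concrete AADL syntax.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class Occurrence:
    """Occurrence of an error event: a base value, or a fraction of another event."""
    base: Optional[float] = None   # directly specified probability
    source: Optional[str] = None   # name of the event this one is derived from
    factor: float = 1.0            # fraction of the source event's probability


def resolve(name: str, model: Dict[str, Occurrence],
            seen: Tuple[str, ...] = ()) -> float:
    """Compute the probability of `name`, following derivation links."""
    if name in seen:
        raise ValueError(f"cyclic occurrence definition involving {name!r}")
    occ = model[name]
    if occ.source is None:
        if occ.base is None:
            raise ValueError(f"{name!r} has neither a base value nor a source")
        return occ.base
    return occ.factor * resolve(occ.source, model, seen + (name,))


# Hypothetical model: the base probability is a placeholder, not measured data.
MODEL = {
    "permanent_failure": Occurrence(base=1e-5),
    "babbling": Occurrence(source="permanent_failure", factor=0.01),
}

print(resolve("babbling", MODEL))
```

Keeping the dependency explicit means that revising the estimate for the source event updates
every derived event, which is convenient when testing sensitivity to such assumptions.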

A similar concept is required for hierarchical composition. By decomposing the state into 
separate automata, we do not want to infer that the states manifest independently. Instead, 
we want the probability numbers and distributions to express the failure rate of the hierarchical 
concurrent child automata. To express such issues in a probabilistic reasoning framework, methods
must be developed that interpret probabilities as “weights,” rather than as hard
numbers.
5 Related Work


Methods to analyze flow latency in Architecture Analysis & Design Language (AADL) models were
presented in [12], the results of which were implemented in a flow latency analysis plug-in. A
key assumption used throughout the study is that the execution platform is globally synchronous.
Therefore, this method cannot inherently capture asynchronous communication, and reduces it to
finding sampling rates for asynchronous events. This approach seems too restrictive for the analysis
of asynchronous high-integrity systems. Moreover, in asynchronous systems the global Worst Case
Execution Time (WCET) may not be the product of local WCETs. This, however, appears
difficult to capture with the assumptions used.

An approach for system-level co-simulation of avionics systems using the Polychrony tool is 
shown in [13]. Polychrony is a tool based on the Signal synchronous language. Synchronous
languages m build on the key assumptions of global synchrony, deterministic concurrency, and 
zero-time computation, making them suitable for the scalable analysis of synchronous systems.
As their name suggests, however, synchronous languages cannot directly express asynchronous 
communication. 

Ocarina m is a tool environment developed in Ada for AADL model analysis and code generation.
Ocarina uses the Cheddar tool for real-time analysis, and integrates the PolyORB middleware
for code generation. The focus of Ocarina is code generation and simulations. In contrast, the 
methods described in this study focus on real-time and fault-tolerance analysis. 

A dependability modeling framework based on AADL and Generalized Stochastic Petri Nets
(GSPNs) is presented in [17]. The authors propose a translation of AADL Error Annex models
to GSPN models in order to analyze reliability and availability. The work proposed in this study
adopts the high-level approach of mapping AADL models to formal Models of Computation (MoCs).
However, we chose Finite State Machines (FSMs) as the underlying MoC for this work to overcome
issues with mixed low- and high-integrity message propagation.

An approach to analyze real-time properties of AADL models using the Unified Modeling
Language (UML) Modeling and Analysis of Real-Time and Embedded Systems (MARTE) profile is
presented in [18]. Activity diagrams are used to capture asynchronous message passing between
threads. Analytical results are then presented to obtain the end-to-end latency of the models, but
the focus of the paper is to show that AADL models can be captured using UML MARTE. A method
to analyze “immediate” and “delayed” data communications in AADL using UML MARTE is
described in [19]. The authors define a clock constraint language that is used for manual calculations.
In this report, we chose to focus on analyzing AADL models directly. Moreover, we focused on an
approach that is amenable to automated analysis using Symbolic Analysis Laboratory (SAL) and
the open-source Distributed Real-time Embedded Analysis Method (Dream) tool.

A method to use AADL as a specification for the performance evaluation of real-time
architectures is presented in [20]. The Cheddar tool is used for the analytical and simulation-based
evaluation of various communication patterns. The approach described in this report builds on a
task graph model, and is capable of achieving exhaustive state space search as described in [21].
Moreover, the token-based extensions allow the capturing of fault tolerance properties.

Schedulability analysis of AADL models using the Algebra of Communicating Shared Resources
(ACSR) is presented in [22]. ACSR can be used to model many different scheduling algorithms,
including Earliest Deadline First (EDF), Least Laxity First (LLF), etc. ACSR is also able to capture
aperiodic threads, and can be analyzed using the VERSA tool [23]. The approach used in this
report is similar, as the real-time analysis is translated to reachability analysis in both cases. The
SAL models developed, however, can also be used to prove fault tolerance properties.
6 Conclusion


We created an example system that, while simple and abstract, has all the characteristics
and flexibility needed to model and analyze asynchronous, synchronous, and mixed
synchronous/asynchronous systems. The TTEthernet communication network used in this example
system has the capability of supporting each of these timing paradigms. From this example system, 
we created AADL models and performed timing analysis. 

In comparing two different implementations of this example system, one asynchronous and one 
synchronous, we found that the asynchronous model (using TTEthernet RC traffic) required an 
oversampling of 26 times that of the synchronous system in order to meet the 50 ms worst-case 
latency requirement established for this example system. This leads to a commensurate increase in 
both processor and bandwidth overhead. In this analysis, we found that the Architecture Analysis &
Design Language (AADL) did not provide the necessary queuing analysis techniques for worst-case
latency calculations; these latencies had to be computed with an off-line queuing analysis tool and
supplied as pre-specified parameters in the AADL v1 model.

We found that schedulability analysis and latency analysis in AADL are currently independent
of each other. Thus, changing
latencies along a flow has no impact on schedulability analysis, and vice versa. Ideally, one would 
apply methods that automatically derive jitters for tasks as influenced by latency analysis. 

The modeling of asynchronous threads in AADL is another challenge. Global synchrony is a key
assumption in AADL. The dispatch policy of a thread can be aperiodic; however, it is not clear what
triggers the thread’s execution in that case. Periodic threads can also be asynchronous, as they
may be triggered by local clocks that can potentially drift from each other. Since AADL assumes
a globally synchronous system, specifying the thread dispatch policy to be periodic would imply a
globally synchronous system. As AADL’s semantics are ambiguous and change depending on the
context, this presents a problem when modeling mixed synchronous/asynchronous systems. One
needs to reason carefully about potentially conflicting real-time constraints. Moreover, for latency
analysis, jitters must be calculated manually by the designers and specified in the models. Once
all such parameters are manually calculated, adding up the latency numbers via AADL provides
little extra value. By bridging the gap between the periodic task model and latency analysis, the
calculation of jitters could be automated.
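
To make concrete how little remains once the per-element figures have been supplied by hand, the
following Python sketch performs the summation step for a single flow. It is not the AADL/OSATE
latency analysis; the element names and all per-element latency values are hypothetical
placeholders, and only the 50 ms bound is taken from the example system requirement.

```python
from typing import List, Tuple

# Each flow element: (name, best-case latency in ms, worst-case latency in ms).
# All names and values are illustrative, not results from the case study.
FlowElement = Tuple[str, float, float]

FLOW: List[FlowElement] = [
    ("sensor_task", 1.0, 2.0),
    ("network_hop", 0.5, 12.5),
    ("control_task", 2.0, 4.0),
    ("actuator_task", 1.0, 1.5),
]


def end_to_end(flow: List[FlowElement]) -> Tuple[float, float]:
    """Sum best-case and worst-case latencies along the flow."""
    best = sum(b for _, b, _ in flow)
    worst = sum(w for _, _, w in flow)
    return best, worst


def meets_requirement(flow: List[FlowElement], bound_ms: float) -> bool:
    """Check the worst-case end-to-end latency against a bound."""
    return end_to_end(flow)[1] <= bound_ms


best, worst = end_to_end(FLOW)
print(f"best {best} ms, worst {worst} ms, jitter {worst - best} ms")
print("meets 50 ms bound:", meets_requirement(FLOW, 50.0))
```

The hard part, deriving the per-element worst-case values from scheduling and queuing behavior,
lies entirely outside this summation.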

We gathered a number of other lessons about AADL from modeling communications protocols.
The difficulty in expressing protocol-centric failure behavior indicates that if the long-term goal is
to use AADL models as key repositories for system dependability attributes, more work needs to
be done to mature AADL’s capability and ease-of-use in this area. As the technology matures,
analysis tools linked to AADL may allow for more systematic architectural trade-offs and safe and
informed architectural optimization.

We found the application of the simple fault taxonomy and naming convention we developed to 
be very beneficial and effective in trying to model failures in and/or propagated by communication 
mechanisms. Exploring the error modes at multiple architectural layers allowed for a more system- 
atic examination of conscience, making the modeler reconsider the potential failure contributions 
at each layer. 

However, attempting to map all ingress and egress error manifestations of a communication
channel to a single state machine per the AADL Error Annex rapidly became intractable, and a
hierarchical decomposition of the communication media driver was required to separate potential
concurrent error contributions. From our experiences with the driver components, we feel
that a generalized method to guide hierarchical decomposition may be beneficial (and potentially
critical) if resulting models are to remain tractable for analysis. Furthermore, it may become
practically impossible to capture different aspects and multiple levels of abstraction in the same
formal model. Reusing verification results from the formal verification of protocol functionality
may help to “guide” the error propagation analysis. Probabilistic analysis lends itself especially
well as a semantic domain for integrating analysis results; this may be a role for SRI’s Evidential
Tool Bus (ETB) integration.

When examining rare and complex failure modes, determining the probabilities for error man- 
ifestations is sometimes closer to art than science. In an initial system model (where detailed 
reliability models and evidence are not available), complex failure modes are often estimated by 
simple rules of thumb, e.g., an expectation that 1% of permanent failures result in babbling. We feel
that the ability to express an event occurrence as a function of another event occurrence may be 
beneficial. For example, using something like: 

$\mathit{occurrence}_{\mathit{fail}\,X} = \mathit{occurrence}_{\mathit{fail}\,Y} \times 0.01$

which means that the probability of X occurring is 1% of the probability of another error event
(Y).

A similar concept is required for hierarchical composition. By decomposing the state into 
separate automata, we do not want to infer that the states manifest independently. Instead, 
we want the probability numbers and distributions to express the failure rate of the hierarchical 
concurrent child automata. To express such issues in a probabilistic reasoning framework, methods
must be developed that interpret probabilities as “weights,” rather than as hard
numbers.
Appendix A


Acronyms and Initialisms 

AADL Architecture Analysis & Design Language 

ACE Analog Control Electronics 

ACSR Algebra of Communicating Shared Resources 

BRAIN Braided Ring Availability Integrity Network 

CTL Computation Tree Logic 

DES Discrete Event Simulation 

Dream Distributed Real-time Embedded Analysis Method 

EDICT Error Detection Isolation Containment Types 

EDF Earliest Deadline First 

ETB Evidential Tool Bus 

FMEA Failure Mode and Effects Analysis 

FSM Finite State Machine 

GSPN Generalized Stochastic Petri Net 

IMA Integrated Modular Avionics 

LLF Least Laxity First 

MAC Medium Access Control 

MARTE Modeling and Analysis of Real-Time and Embedded Systems 

MoC Model of Computation 

NIC Network Interface Controller 

OSATE Open Source AADL Tool Environment 

PE Processing Element 

RC rate-constrained 

SAE Society of Automotive Engineers 

SAL Symbolic Analysis Laboratory 

TT time-triggered 

TTP/C Time-triggered Protocol 

UML Unified Modeling Language 

WCET Worst Case Execution Time 
Appendix B


DREAM Model of TTEthernet Case Study 


The AFCS-Distributed Systems project has been established on NASA’s DASHlink to support
public dissemination of the models and results of this program. The URL https://c3.nasa.gov/dashlink/
will take the user to the Home Page of DASHlink. The user can access the site by hovering over
the RESEARCH AREAS pull-down list and right clicking on Verification and Validation. Scroll 
down to AFCS Distributed Systems and right click the project. Right click Distributed Real-time 
Embedded Analysis Method (DREAM) under Popular Resources. The DREAM model is contained 
in the file ttethernet-dream_model.txt.